Semi-Supervised Learning for Natural Language Processing

Authors

  • John Blitzer
  • Xiaojin Zhu
Abstract

The amount of unlabeled linguistic data available to us is much larger and growing much faster than the amount of labeled data. Semi-supervised learning algorithms combine unlabeled data with a small labeled training set to train better models. This tutorial emphasizes practical applications of semi-supervised learning; we treat semi-supervised learning methods as tools for building effective models from limited training data. An attendee will leave our tutorial with:

  1. A basic knowledge of the most common classes of semi-supervised learning algorithms and where they have been used in NLP before.
  2. The ability to decide which class will be useful in her research.
  3. Advice on avoiding potential pitfalls in semi-supervised learning.

Similar resources

Special semi-supervised techniques for Natural Language Processing tasks

A labeled natural language corpus is often difficult, expensive or time-consuming to obtain as its construction requires expert human effort. On the other hand, unlabelled texts are available in abundance thanks to the World Wide Web. The importance of utilizing unlabeled data in machine learning systems is growing. Here, we investigate classic semi-supervised approaches and examine the potenti...

Semi-supervised Classification for Natural Language Processing

Semi-supervised classification is an interesting idea where classification models are learned from both labeled and unlabeled data. It has several advantages over supervised classification in the natural language processing domain. For instance, supervised classification exploits only labeled data that are expensive, often difficult to get, inadequate in quantity, and require human experts for anno...

Scalable Graph-Based Learning Applied to Human Language Technology

Andrei Alexandrescu. Chair of the Supervisory Committee: Associate Research Professor Katrin Kirchhoff, Electrical Engineering. Graph-based semi-supervised learning techniques have recently attracted increasing attention as a means to utilize unlabeled data in machine learning by placing data points in a similarity graph. However, ...

Invited Talk: Domain-adaptation of Natural Language Processing Tools for RE

Natural language processing tools like part-of-speech taggers and parsers are being used in a variety of applications involving natural language, including RE. Such tools, based on statistical models of language, are learnt via supervised machine learning algorithms from human-annotated data. Due to their dependence on annotated data, which is limited in size and genre, these models have a fall...

Tutorial on Inductive Semi-supervised Learning Methods: with Applicability to Natural Language Processing

Supervised machine learning methods which learn from labelled (or annotated) data are now widely used in many different areas of Computational Linguistics and Natural Language Processing. There are widespread data annotation endeavours but they face problems: there are a large number of languages and annotation is expensive, while at the same time raw text data is plentiful. Semi-supervised lea...

Publication date: 2008